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Abstract. In the following paper, we present a consistent Newton-Schur solution approach for 
variational multiscale formulations of the time-dependent Navier-Stokes equations in three dimen- 
sions. The main contributions of this work are a systematic study of the variational multiscale 
method for three-dimensional problems, and an implementation of a consistent formulation suit- 
able for large problems with high nonlinearity, unstructured meshes, and non-symmetric matrices. 
In addition to the quadratic convergence characteristics of a Newton-Raphson based scheme, the 
Newton-Schur approach increases computational efficiency and parallel scalability by implement- 
ing the tangent stiffness matrix in Schur's complement form. As a result, more computations are 
performed at the element level. Using a variational multiscale framework, we construct a two-level 
approach to stabilizing the incompressible Navier-Stokes equations based on a coarse and fine- 
scale subproblem. We then derive the Schur's complement form of the consistent tangent matrix. 
We demonstrate the performance of the method for a number of three-dimensional problems for 
Reynolds number up to 1000 including steady and time-dependent flows. 



1. INTRODUCTION 

The incompressible Navier-Stokes equations are used to model a number of important physical 
phenomena including, pipe flow, flow around airfoils, ocean currents, weather, and blood flow in 
an artery, among others. Significant emphasis has been placed in the literature on developing sta- 
bilized formulations robust enough to model complex flows at high Reynolds number [H El E] • The 
variational multiscale method, due to Hughes [1], has been gaining popularity as a robust stabi- 
lization technique. This method consists of decomposing the problem into a coarse (or resolvable) 
and fine-scale (unresolvable) subproblem. The inclusion of the fine-scale influence stabilizes the 
solution. 
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The variational multiscale method has been well studied for two-dimensional problems, but less 
so for three-dimensional problems. Also, three-dimensional effects become more influential as the 
Reynolds number increases. It is well known that two-dimensional analysis cannot accurately 
capture flow features for high Reynolds flows. For example, it is well known that for the backward 
facing step problem at Reynolds number greater than 450, the reattachment length obtained by 
two-dimensional analysis does not match experimental results [S]. One of the main contributions 
of this paper is a systematic study of the variational multiscale method for three-dimensional 
problems. To solve three-dimensional problems requires many more degrees of freedom. Using a 
mixed formulation, in which velocity and pressure are treated as unknowns, the number of degrees of 
freedom easily exceeds several million. Another contribution of this work is the Schur's complement 
implementation which greatly reduces the problem size and improves parallel performance. 

The main challenge to numerically modeling the incompressible Navier-Stokes equations using 
the finite element method is due to the well known instability associated with using equal order 
interpolation for the velocity and pressure (which is computationally most convenient). Many 
stabilized methods for the incompressible Navier-Stokes equations have been developed that allow 
for equal order interpolation including [H [71 [8l HI [9l [TOl- Stabilized formulations that employ a 
variational multiscale framework have been gaining popularity. First introduced as a pathway 
to developing stabilized formulations from first principles, the variational multiscale concept has 
been used in a number of contexts, particularly for the incompressible Navier-Stokes equations 

[iiiiiaiiiiiiiiiTiiisiiiiiin]. 

One can either solve the fine-scale problem in terms of the coarse-scale variables, thereby gener- 
ating a stabilization term, and substitute back into the coarse-scale problem, or solve both problems 
simultaneously. Typically a fixed-point iteration technique is used to linearize the weak form of 
the governing equations. We have shown in [12] that by using a consistent Newton-Raphson based 
linearization, quadratic convergence may be achieved and that convergence may be achieved for 
problems for which fixed-point iterative methods diverge or converge very slowly. In this work, we 
present a Schur's complement implementation of the consistent Newton-Raphson based lineariza- 
tion of incompressible Navier-Stokes equations that provides for good computational efficiency and 
scalability. We extend the work presented in [12] to large, nonlinear, three-dimensional problems 
on unstructured meshes. 
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Using the Schur's complement implementation of consistent formulations has been used primarily 
in the field of domain decomposition techniques and computational plasticity |18l I19j . The Schur's 
complement implementation requires using block Gauss elimination to eliminate the fine-scale terms 
from the consistent tangent matrix. The resulting tangent matrix contains modified coarse-scale 
terms that take into account the fine-scale influence. If needed, the fine-scale terms can be obtained 
via post-processing. The Schur's complement form of the consistent tangent matrix reduces the 
problem size, and provides for much greater parallel scalability. 

Primary emphasis in this paper is placed on solving several canonical three-dimensional problems 
for which experimental and numerical results are available in the literature. We show that using 
the Newton-Schur approach not only improves convergence, but also allows for much more efficient 
use of parallel resources. In the first section, we introduce the variational multiscale framework and 
describe the decomposition of the problem into a coarse and fine-scale subproblem. In the next 
section, we present the consistent linearization of the Navier-Stokes equations and incorporate the 
Schur's complement form of the consistent tangent matrix to complete the Newton-Schur solution 
approach. We then present a number of numerical examples that illustrate the advantages of the 
consistent Newton-Schur solution approach. 

2. GOVERNING EQUATIONS 

Let be a bounded open domain, and F be its boundary, which is assumed to be piecewise 
smooth. Let the velocity vector field be denoted by d : 17 — > M""', where "nd" is the number of 
spatial dimensions. Let the (kinematic) pressure field be denoted by p : ^ M. As usual, T is 
divided into two parts, denoted by T'" and T*, such that F'^nF* = and F'^UF* = F. F'", and F* are 
the Dirichlet and Neumann boundaries respectively. The governing equations for incompressible 
Navier-Stokes flow can be written as 

(1) V -Vv - 2vV^v + \/p = b in n 

(2) V • u = in 17 

(3) v = vP on F'" 

(4) -pn + u{n ■ V)v = on F* 

where v is the velocity, p is the kinematic pressure (pressure divided by density), V is the gradient 
operator, is the Laplacian operator, b is the body force, > is the kinematic viscosity, is 



the prescribed velocity vector field, is the prescribed traction, and n is the unit outward normal 
vector to F. Equation ([T|) represents the balance of linear momentum, and equation ([2]) represents 
the continuity equation for an incompressible continuum. Equations ^ and (jH) are the Dirichlet 
and Neumann boundary conditions, respectively. 

3. VARIATIONAL MULTISCALE FRAMEWORK 

It is well known that the classical mixed formulation for the incompressible Navier-Stokes equa- 
tions does not produce stable numerical results. The instability of the standard Galerkin formula- 
tion may be mathematically explained by the LBB inf-sup stability condition [20\ [1] . To get stable 
numerical results, one must stabilize the standard Galerkin formulation. The variational multiscale 
concept, which stems from the pioneering work by Hughes [1], decomposes the underlying fields 
into coarse or resolvable scales and subgrid or unresolvable scales. 

3.1. Multiscale decomposition. Let us divide the domain into N non-overlapping subdomains 
0,^ (which in the finite element context will be elements) such that 

N 

(5) n=[jn'' 

e=l 

The boundary of the element $7"^ is denoted by F*^. We decompose the velocity field v{x) into 
coarse-scale and fine-scale components, indicated as v{x) and v'{x), respectively. 

(6) v{x) = v{x) + v'{x) 

Likewise, we decompose the weighting function w{x) into coarse-scale w{x) and fine-scale w'{x) 
components. 

(7) w{x) = w{x) + w' (x) 

We further make an assumption that the fine-scale components vanish along each element boundary. 

(8) v'{x) = w'{x) = on F*" ; e = 1, . . . , 

Let V be the function space for the coarse-scale component of the velocity v, and W be the function 
space for w; and are defined as 

(9) V := {v\v £ {H\Q))'''^,v = vP onT"} 

(10) W := {w \w e {H^{n))'"^,w = 011^} 
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Let V' be the function space for both the fine-scale component of the velocity v' and its corre- 
sponding weighting function w', and is defined as 



(11) V' ■.= {v\ve {H^{n^)r^, V = OonT^,e = l,...,N} 

In theory, we could decompose the pressure field into coarse-scale and fine-scale components. How- 
ever, for simplicity we assume that there are no fine-scale terms for the pressure p{x) and for its 
corresponding weighting function q{x). Hence, the function space for the fields p{x) and q{x) is V 

(12) V := {p \ p e L\n)} 

where L^(r2) is the space of square-integrable functions on the domain Q. For further details on 
function spaces refer to Brezzi and Fortin |20| . 



3.2. Two-level classical mixed formulation. After substitution of equations ([6]) and ([7]) into the 
classical mixed formulation and decomposing the problem into a coarse and fine-scale subproblem, 
the resulting weak form can be written as two parts. The coarse-scale problem can be written as: 

(13) {w, {v + v') ■ V{v + v')) + {Vw, 2uV{v + v')) - {V ■w,p) = {w, b) + {w, h)rt V w G VV 

(14) {q,V ■ {v + v')) = yqeV 

The fine-scale problem can be written as: 
(15) 

{w', {v + v') ■ V{v + v')) + {Vw', 2i/V('j) + v')) - (V • w',p) = {w', b) + {w' , h)r, y-w'eW 

For convenience, we define the inner-product over a spatial domain, K, as 

(16) {a,b)K = / a-bdK 

Jk 

The subscript K will be dropped if K is the whole of 0,, that is K = 0,. Notice that the weak form 
is nothing more than the classical mixed formulation written for the coarse and fine-scale varaibles, 
with the exception that there is no weak form of the fine-scale incompressibility constraint. For a 
more detailed derivation of the weak form, see |12j . 
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3.3. Fine scale interpolation and bubble functions. If one chooses a single bubble function 
for interpolating the fine-scale variables (similar to the MINI element [21])) then we have 



(17) v' = b^f3; w' = b^j 

where is a bubble function, and f3 and 7 are constant vectors. The gradients of the fine-scale 
velocity and weighting functions are 

(18) V^;' = pVb^^; Vw' = 7V6^^ 
where V6*^ is a dim x 1 vector of the derivatives of the bubble function. 

4. CONSISTENT NEWTON-SCHUR SOLUTION STRATEGY 



For the consistent Newton-Schur method, we treat equations (|13p - ()15p as global residuals and 
obtain the solution using an iterative update equation. 

4.1. Vector residual. First, we express the solution field and its weighting function in terms of 
nodal values v = v ^AT^ and w = w'^N'^ , respectively (where AT is a row vector of shape functions 
for each node). After substituting these expressions into the global residuals, the vector residuals, 
R, that are the sum contributions of the vector residuals at the element level, R^, can be written 
as 

Rl{v; v',p) := / {N'^ I){{v + v') ■V){v + v') dO 
Jn 

+ 2v f {{DN)J'^ Q I)vec[Vv + Vv'] 
Jn 

(19) - [ vec[J~^{DNf]pdn- [ {N'^Ql)bdn- [ {N^ Q I)h dT 

Jn Jn Jvt 

(20) Rliv; v') ■■=- [ N^V ■ {v + v') dQ 

Jn 

R){v] v',p) := 1 b^I{{v + v') ■V){v + v') dn+ 2v j (V6'=^ I)yec[Vv + Vv'] d^ 

(21) - / (V6^^ I)vec[I]p dO - / bnbdn 

where the subscripts 'c', 'p', and '/' stand for coarse, pressure, and fine, and DN represents a 
matrix of the first derivatives of the element shape functions, which is defined for a triangular 
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element as 



(22) 



DN 



dNi dNi 



dN3 dN3 



J is the element jacobian matrix, vec[-] is an operation that represents a matrix with a vector, 
is the Kronecker product [22]. 



Remark 1. For transient problems, we simply add the time derivative of the velocity to equation 
([T|). Using the backward Euler method to integrate in time from step n to step n + 1, we add the 
following terms, -^{N'^ Ql)vn+i d^l — J^ -^{N'^Ql)vn and J^^ ^Ivn+i d^l — J^^ ^^^n dJl 
to equations ()19p and (|2ip , respectively, where Vn "is the converged velocity from the previous time 
step. For furhter details, see [12] . 



4.2. Tangent matrix. Using a Newton-Raphson type approach, we obtain the solution in an iter- 
ative fashion using the following update equation until the residual is under a prescribed tolerance, 

(23) v'+^ = v' + A^^ ; p'+^ = f + Ap^ ; v''+^ = v'' + A^'* 



where the updates at each iteration, i are calculated from the following system of equations 



(24) 



DHc DKc DHc 

Di3 Dp Dr;' 

DRp DRp DKp 

Di3 Dp Dri' 

DRf DRf DRf 

IW T5?r T3i7" 





Av 




^"1 


< 


Ap 


H 










ill 



The element matrices, which are assembled to form the consistent tangent matrix, are pre- 

sented in [T2I. 



5. SCHUR'S COMPLEMENT IMPLEMENTATION 

By applying block Gauss elimination to equation (j24p . we obtain the Schur complement repre- 
sentation of the update equation. 



(25) 



pv ^pp 



Av ' 


hi 


^1 


Ap ^ 




R2 
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where each of the stiffness terms, K, represent the assembled element contributions as follows: 

Nele 



K 



vp 



e=l 
Nele 

A 

e=l 

Nele 



K 



pv 



(26) 



K 



pp 



And the residuals are given as 



(27) 



Ri 



R2 



A 

e=l 
Nele 

A 

e=l 
Nele 

A 

e=l 
Nele 

A 

e=l 





BRl 




BRj 




Bv' 


Bv' 


Bv 


'dri 


BRl 


\BRy 


^^BR) 


Bp 




Bv' 


Bp 


'dr; 


br; 


\BRy 


~'BR) 


Dv 


Bv' 


Bv' 


Bv 



BRl 
'~Bv' 



Rl 



BR^ 
Bv' 



BR^ 

Bp 



Rl 



BK 
Bv' 



br; 

Bv' 



BR^ 
Bv' 



BR 



Bv' 



From equation [25] we update the coarse-scale velocity and pressure. We then solve for the fine-scale 
increment from 



(28) 





-1 - 


Bv' 





R 



BR 



f 



Av 



BR 



Ap 



Bv Bp 

Again, the coarse and fine-scale velocity and pressure are updated until convergence. In addition 
to faster convergence properties obtained for the Newton-Raphson type solution procedure, the 
Schur complement representation increases parallel performance by taking advantage of the fine- 
scale terms vanishing on the boundary of the element. The fine-scale computations can therefore 
take place at the element level without the need for inter-processor communication. 

6. NUMERICAL RESULTS 

The above formulation was implemented in C+-|- on the Turing cluster which consists of 768 
Apple Xservers, each with two 2 GHz G5 processors and 4 GB of RAM, for a total of 1536 processors 
|23| . The cluster is connected using a high-bandwidth, low-latency Myrinet network from Myricom. 
The operating system used is Mac OS X Server, version 10.3. The data structures were implemented 
in parallel using the Portable, Extensible Toolkit for Scientific Computation (PETSc) [23] Vec and 
Mat objects (note that the Vec object is different than the vec[-] operator). PETSc, in turn uses 



MPICH for parallel communication. Linear solutions for each iteration were obtained using the 
iterative Krylov subspace solver, KSP, provided by PETSc. The tolerance was set to IE- 12 and 
the maximum number of iterations was set to 50. The code aborts if the max number of iterations 
are reached without meeting the tolerance. 

Between one and 128 processors were used for each simulation and the results were visualized us- 
ing Tecplot 360 [25] and Visit 1.8.1 The mesh was partitioned using a simple block partitioning 
strategy. The parallel results may be improved with the use of a more sophisticated partitioning 
algorithm, but that is beyond the scope of this work. In order to avoid deadlock, matrix assembly 
routines are called by each process after all of its elements have been computed. 

6.1. Body force driven cavity. Using the body force driven cavity problem, we compare the 
speedup on a single processor between a consistent formulation using the Schur's complement 
implementation and one without. The problem description, geometry, and boundary conditions for 
the body force driven cavity problem are given in [12]. An exact solution is derived for a given body 
force. The results for both the Newton- Raphson approach, presented in |12j . and the Newton-Schur 
approach, presented here, are shown in Figure [H Figure [Tj shows the velocity in the x-direction 
along the center of the cavity as y increases from the bottom to the top. Notice that the results 
are identical for both methods. 

Figure [2] compares the computational time required by the Newton-Raphson and Newton-Schur 
approaches as the number of processors increases. Notice that for 32 processors, the computational 
costs are reduced by over 40%. Since the computational advantages of the Schur's complement 
implementation are based largely on parallel features, the benefits of the Newton-Schur approach 
grow as the number of processors increases. 

6.2. Three-dimensional lid-driven cavity. The geometry and boundary conditions for the 
three-dimensional lid-driven cavity are shown in Figure [3j A unit velocity is prescribed in the 
positive x-direction across the top of a cube of unit volume. The viscosity is adjusted to ob- 
tain higher Reynolds numbers. Steady solutions to the three-dimensional lid-driven cavity have 
only been shown to exist for Reynolds number less than 2000. Experimental studies of the three- 
dimensional lid-driven cavity show a number of unsteady motions for Re > 2000, for example eddies 
and Gortler vortices |27j . 
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Figure m shows the velocity in the x-direction along the centerline of the cavity along the height 
of the cavity. Notice that the results correspond well with those reported in [5]. Using the Newton- 
Schur solution approach we were able to obtain steady state solutions up to Re = 865. Above 
Re = 865 the Newton-Schur solution approach diverges. In [5] steady-state solutions are obtained 
for Re = 1000 using a velocity-pressure-vorticity formulation and a time-marching scheme. To our 
knowledge, this paper presents the first results for high Reynolds flows for the lid-driven cavity 
without the use of a time-marching scheme. As the Reynolds number increases, the instabilities 
created by boundary layer flow over concave surfaces, like the corners of the cavity, begin to take 
effect. These instabilities represent the transition into turbulent flow [28]. The failure to converge 
above Re = 800 for the Newton-Schur solution strategy could be due to the presence of Taylor- 
Gortler vortices as reported in [5]. 

Taylor-Gortler vortices occur when the velocity profile approaches zero at the boundary over 
a concave (and in some cases convex) surface. They are the result of a centrifugal instability. 
Rayleigh first recognized this instability in 1916 and showed that as the distance increases radially 
from the center point of a curved surface, the velocity must also increase, otherwise an inviscid 
axisymmetric instability occurs. In the case of the corners of the three-dimensional cavity, the flow 
decreases due to friction with the walls causing a decrease in velocity with radial distance. 

Figure [5] shows the results of the three-dimensional lid-driven cavity problem for Re = 800 which 
were obtained by direct solution of the steady state Navier-Stokes equations using the Newton- 
Schur solution approach. Notice the presence of the Taylor-Gortler vortices shown in the bottom 
corners of the z-y profile. The presence of these vortices at Re = 800 may suggest that the transition 
to turbulent flow may be occurring well before Re = 2000. Notice the similarity between the results 
presented in Figure [5] and those reported for Re = 1000 in [5] which also suggests that unstable 
flow may be occurring at a lower Reynolds number than previously reported. 

6.3. Three-dimensional jet flow from an orifice. The three-dimensional jet flow from an 
orifice problem represents the laminar evolution of vortices formed by flow through a small hole in 
an infinite no-slip wall. The problem description and boundary conditions are shown in Figure [H 
As the flow is injected into the domain, a mushroom like vortex forms which grows in time. The 
viscosity was given as 0.001, which corresponds with a Reynolds number of 1000. A time step of 
0.01 was also used which is typical for similar problems. 
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Figure [7] shows the pressure contours at time t = 1.0s. A section of the contours have been 
removed in order to view the inside of the vortex. Figure [8] shows the velocity magnitude contours 
and the streamtraces for the three-dimensional jet problem. Notice that the streamtraces are 
similar to those from the two-dimensional analysis found in [3l [12]. Notice that the pressure 
contours are moving forward and spreading out. A large region of high pressure forms in front of 
the spreading region of low pressure. The low pressure region corresponds with the vortex shown 
in the streamtraces of Figure [8l 

In Figure [9] we show a comparison between the two-dimensional results found in [12] and the 
three-dimensional results presented herein. Notice that for both pressure and velocity the results 
do not match closely suggesting that three-dimensional effects are present for the jet flow problem. 

6.4. Parallel speedup and isoefRciency. To explore the parallel features of the Newton-Schur 
solution approach a parallel performance study was performed. The lid-driven cavity problem was 
solved using a variety of mesh sizes and number of processors. The boundary conditions for the 
three-dimensional lid-driven cavity are shown in Figure [3j The sizes of each mesh are listed in 
Table [TJ The largest mesh consists of over one million degrees of freedom. The problem was solved 
for each mesh size on up to 128 processors. 

Table 1. Mesh Sizes in Degrees of Freedom (DOF) for Parallel Results 



Mesh 


Coarse DOF 


Fine DOF 


Total DOF 


A 


34,591 


167,289 


201,880 


B 


61,697 


296,496 


358,193 


C 


95,803 


457,476 


553,279 


D 


13,5571 


644,676 


780,247 


E 


224,243 


1,061,157 


1,285,400 



The parallel speedup for each mesh is shown in Figure [TOl We measure the speedup as the ratio of 

the parallel execution time over the serial execution time. Notice that Meshes D and E superlinear 

speedup is obtained, while for Meshes A, B, and C, the speedup is less than linear. These results 

correspond with the expected performance as the problem size increases. For a larger problem size, 

the ratio of computation time to communication time becomes smaller, allowing for more efficient 

computation. For a fixed problem size, no algorithm is scalable in the sense that eventually the 
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communication costs will far exceed the computation cost as the number of processors increases. In 
order to better gage the parallel features of the Newton-Schur solution approach, an isoefficiency 
study was performed. Isoefficiency contours characterize the problem size required for a given 
number of processors to achieve a particular value of efficiency. By efficiency, we are referring to 
the ratio of the serial costs over the parallel costs. For a given serial computation time, Ti, and 
parallel time, Tp, for p processors, the efficiency, E, is given as 

Figure [TT] shows the isoefficiency contours for each of the meshes in Table [H An algorithm that is 
poorly scalable will have almost vertical isoefficiency contours, which correspond to a much greater 
problem size required to obtain a particular efficiency for a given number of processors [29] . Notice 
that for greater than 32 processors, the isoefficiency contours are almost horizontal which suggests 
that to sustain a certain amount of efficiency for a greater number of processors does not require a 
substantially larger problem size. The isoefficiency contours in Figure [TT] reveal that the Newton- 
Schur solution approach is very scalable. Good parallel performance is obtained for a reasonable 
problem size. 

Table 2. Summary of Parallel Results 



Mesh 


Max. Efficiency 
Obtained (%) 


Theoretical 
Speedup 


Speedup 
Obtained 


A 


86 


27 


22 


B 


86 


42 


38 


C 


92 


146 


75 


D 


122 




175 


E 


135 




103 



The parallel performance data is summarized in Table [2j Values for the theoretical speedup 
were obtained using Amdahl's law, which estimates the speedup based on the serial fraction of the 
computation. The speedup cannot exceed the inverse of the serial fraction. For meshes D and E 
Amdahl's law is not appropriate because the efficiency is greater than 100%. This may be due to the 
fact that for such a large problem size, the data does not fit in the cache memory on one processor 
leading to an extremely long serial computation time. Once the problem has been partitioned over 
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several processors, the data fits into the cache which results in an efficiency greater than 100%. 
The superlinear speedup obtained for meshes D and E is also explained by this circumstance. 

It is important to note that the number of unknowns is greatly increased by the addition of the 
fine scale variables. For example, for the lid-driven cavity problem, the total number of degrees 
of freedom for the most refined mesh is 1,285,400 whereas the number of coarse-scale degrees 
of freedom is 224,243 which makes up only 21% of the total. By using the Schur's complement 
implementation the initial problem size is reduced by roughly 80%. The scalability of the algorithm 
helps to offset the extra computational cost produced by the dual scales. Although the problem 
size is much greater, the parallel performance is improves as the problem size grows. Also, the 
fine-scale problem may be further partitioned due to the uncoupled nature of the fine-scale terms. 

Remark 2. This example also illustrates the increased cost associated with using the variational 
multiscale framework. The computational cost associated with the variational multiscale framework 
is approximately four times the cost of solving the coarse-scale problem alone. 

7. CONCLUSIONS 

In summary, we have investigated the performance a Schur's complement implementation of 
a consistent linearization of the incompressible Navier-Stokes equations for a variety of three- 
dimensional problems. We obtained high Reynolds solutions to the lid-driven cavity problem with- 
out the use of a time-marching scheme. The results obtained for Re = 800 suggest that instabilities 
in the flow, which represent the onset of the transition to turbulent flow may occur at Reynolds 
numbers less than 1000. We also showed that three dimensional effects may be present for the jet 
flow through an orifice problem. The results reveal a number of interesting features of the varia- 
tional multiscale framework. For example, for large problems the dual scales introduce a significant 
amount of additional degrees of freedom. For the lid-driven cavity problem, the fine scale variable 
comprised over 80% of the total degrees of freedom. Regarding the parallel performance of the 
Newton-Schur solution approach, we were able to achieve superlinear speedup for meshes with 
greater than one million degrees of freedom. By means of an isoefficiency study, we also showed 
that the Newton-Schur solution algorithm is scalable for reasonable problem sizes. 
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Figure 1 . Body force driven cavity: velocity in the x-direction along the centerline 
of the cavity for the Newton-Schur (NS) and Newton-Raphson (NR) solution ap- 
proaches. 
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Figure 2. Body force driven cavity: computational time in seconds comparing 
the Newton-Schur (NS) and Newton-Raphson (NR) solution approaches for a given 
number of processors and a fixed problem size of 200,000 degrees of freedom. 
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Figure 3. 3D lid-driven cavity: problem description and boundary conditions. All 
boundaries are given no-slip wall conditions {vx = Vy = = 0) except for the top 
boundary which is given a unit velocity in the x-direction and zero velocity in y and 
z. The pressure at one node at the center of the domain is prescribed as zero. 




Figure 4. 3D lid-driven cavity: velocity in the x-direction as measured along the 
center of the cavity in the z-direction (x = 0, y = 0, z) compared to the results 
obtained in [5]. 
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Figure 5. 3D lid-driven cavity: numerical results. The left column shows vector 
plots of the velocity, the middle column shows the velocity magnitude, and the right 
column shows pressure contours in the x-y plane (top row), x-z plane (middle row), 
and y-z planes (bottom row). 

Dr. Kalyana Babu Naskshatrala, Department of Civil and Environmental Engineering, 2524 Hy 

DROSYSTEMS LABORATORY, UNIVERSITY OF ILLINOIS AT URBANA-ChAMPAIGN, URBANA, ILLINOIS - 61801. 
E-mail address: nakshatr@illinois.edu 

Professor Keith D Hjelmstad, Department of Civil and Environmental Engineering, 3129e New 
MARK Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois - 61801. 
E-mail address: kdh@illinois.edu 



18 







Figure 6. 3D jet problem: problem description and boundary conditions. All 
boundaries are given traction free conditions (<t = 0) except for the no-slip wall 
in which the orifice lies. The velocity is prescibed as zero in all directions on the 
no-slip wall. Over the orifice a parabolic velocity is prescibed in the x-direction with 
maximum magnitude of 1.0 and zero along the edges of the orifice. The velocities 
in y and z over the orifice are zero. 




Figure 7. 3D jet problem: pressure contours with a slice removed to show detail 
at time t = 1.0s. The pressure contours are moving forward and spreading sideways. 
Also note the large region of high pressure that forms in front of a spreading region 
of low pressure. 




Figure 8. 3D jet problem: velocity magnitude contours and streamtraces at time 
t = 1.0s. The mushroom-shaped vortex formed by the jet at the orifice is clearly 
shown by the streamtraces. 
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Figure 10. Parallel speedup for various mesh sizes. For the number of degrees of 
freedom in each mesh, see Table [TJ 
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Figure 11. Isoefficiency contours for the Newton-Schur solution approach. 
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